{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sentiment Analysis\n",
    "\n",
    "This notebook demonstrates how to build a sentiment classifier with the fastai NLP pipeline on the Azure ML service.\n",
    "\n",
    "Let's import the required Azure ML packages and define the needed constants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "import os\n",
    "import json\n",
    "\n",
    "from azureml.core import (Workspace,\n",
    "                          Experiment,\n",
    "                          RunConfiguration,\n",
    "                          VERSION)\n",
    "\n",
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "from azureml.core.conda_dependencies import CondaDependencies\n",
    "from azureml.widgets import RunDetails\n",
    "from azureml.train.dnn import PyTorch\n",
    "from azureml.train.estimator import Estimator\n",
    "\n",
    "from azureml.core.model import InferenceConfig, Model\n",
    "from azureml.core.webservice import LocalWebservice, AciWebservice\n",
    "\n",
    "print(\"SDK version:\", VERSION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in the details of your Azure subscription and workspace\n",
    "SUBSCRIPTION_ID = \"\"\n",
    "RESOURCE_GROUP = \"\"\n",
    "WORKSPACE_NAME = \"\"\n",
    "\n",
    "EXPERIMENT_NAME = \"SentimentAnalysis\"\n",
    "CLUSTER_NAME = \"gpucluster\"\n",
    "\n",
    "# Local folders for the dataset, the training code and the scoring code\n",
    "PROJECT_DIR = os.getcwd()\n",
    "DATASET_DIR = os.path.join(PROJECT_DIR, 'data')\n",
    "TRAIN_DIR = os.path.join(PROJECT_DIR, 'code', 'train')\n",
    "INFERENCE_DIR = os.path.join(PROJECT_DIR, 'code', 'score')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize Azure ML workspace\n",
    "\n",
    "We initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object that connects to the Azure ML workspace, then save its configuration locally for later sessions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ws = Workspace(subscription_id=SUBSCRIPTION_ID,\n",
    "               resource_group=RESOURCE_GROUP,\n",
    "               workspace_name=WORKSPACE_NAME)\n",
    "\n",
    "# Save the workspace details to .azureml/config.json for later sessions\n",
    "ws.write_config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload dataset to datastore\n",
    "\n",
    "To make the data accessible for remote training, we upload the dataset to the workspace's blob [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "default_store = ws.datastores[\"workspaceblobstore\"]\n",
    "\n",
    "# Upload the local dataset folder; the returned reference is passed to the training script\n",
    "data_reference = default_store.upload(src_dir=DATASET_DIR,\n",
    "                                      target_path='sentiment_analysis',\n",
    "                                      overwrite=True,\n",
    "                                      show_progress=True)\n",
    "print(data_reference)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize Azure ML compute\n",
    "\n",
    "Here we set the remote compute target that will be used for training. If the cluster name provided is not already provisioned in the workspace, the cluster is created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    # Reuse the cluster if it already exists in the workspace\n",
    "    cluster = ComputeTarget(ws, CLUSTER_NAME)\n",
    "    print(CLUSTER_NAME, \"found\")\n",
    "except ComputeTargetException:\n",
    "    print(CLUSTER_NAME, \"not found, provisioning...\")\n",
    "    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=2)\n",
    "    cluster = ComputeTarget.create(ws, CLUSTER_NAME, provisioning_config)\n",
    "\n",
    "cluster.wait_for_completion(show_output=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize training estimator\n",
    "\n",
    "- We initialize an estimator and configure the script parameters with the arguments that the training script expects.\n",
    "- We define the conda environment file with the fastai library.\n",
    "\n",
    "Note: we save the conda file twice, once for each directory (training and scoring)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cd = CondaDependencies()\n",
    "#cd.add_pip_package('matplotlib')\n",
    "#cd.add_channel(channel = 'pytorch')\n",
    "#cd.add_channel(channel = 'fastai')\n",
    "cd.add_pip_package('fastai')\n",
    "\n",
    "cd.save_to_file(conda_file_path='env.yml',\n",
    "                base_directory=TRAIN_DIR)\n",
    "cd.save_to_file(conda_file_path='env.yml',\n",
    "                base_directory=INFERENCE_DIR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "script_params = {'--input_dir': data_reference,\n",
    "                 '--lm_lr': 5e-3,\n",
    "                 '--clf_lr': 1e-5,\n",
    "                 '--momentum_1': 0.9,\n",
    "                 '--momentum_2': 0.7}\n",
    "\n",
    "estimator = Estimator(source_directory=TRAIN_DIR,\n",
    "                      script_params=script_params,\n",
    "                      conda_dependencies_file='env.yml',\n",
    "                      compute_target=cluster,\n",
    "                      entry_script='train.py',\n",
    "                      use_gpu=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create experiment and submit run for execution\n",
    "\n",
    "Now we are ready to start training the model. We submit the estimator to an experiment and monitor the run with the `RunDetails` widget."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = Experiment(ws, name=EXPERIMENT_NAME)\n",
    "run = experiment.submit(estimator)\n",
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download & register model to workspace\n",
    "\n",
    "In the training script, we save the model to the built-in *outputs* folder, which Azure ML automatically uploads to the run.\n",
    "\n",
    "Here we download the model and register it in the workspace."
   ]
  },
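  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Wait for the training run to finish so that its outputs are available\n",
    "run.wait_for_completion(show_output=True)\n",
    "model_path = os.path.join('outputs', 'classifier.pth')\n",
    "run.download_file(model_path, output_file_path=model_path)"
   ]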
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Model.register(workspace=ws,\n",
    "                       model_name='sa_classifier',\n",
    "                       model_path=model_path,\n",
    "                       description=\"Sentiment analysis classifier\")\n",
    "print(model.name, model.version, sep='\\t')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy Web service for inference\n",
    "\n",
    "Now we are ready to operationalize the model. Azure ML builds a Docker image that contains the score.py file to serve predictions and the conda environment file that declares the package dependencies, then deploys the web service endpoint to Azure Container Instances.\n",
    "\n",
    "For more information on operationalization in Azure ML, see https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inference_config = InferenceConfig(runtime=\"python\",\n",
    "                                   entry_script=os.path.join(INFERENCE_DIR, \"score.py\"),\n",
    "                                   conda_file=os.path.join(INFERENCE_DIR, \"env.yml\"))\n",
    "\n",
    "deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n",
    "\n",
    "service = Model.deploy(workspace=ws,\n",
    "                       name=\"sentiment-analysis-image\",\n",
    "                       models=[model],\n",
    "                       inference_config=inference_config,\n",
    "                       deployment_config=deployment_config)\n",
    "\n",
    "service.wait_for_deployment(show_output=True)"
   ]
  },
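  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test deployed web service\n",
    "\n",
    "We send a sample sentence as a JSON payload to the deployed service and display the returned prediction."
   ]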
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "test_sample = json.dumps({'data': [\"That was an awesome experience, I will watch it again!\"]})\n",
    "service.run(test_sample)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:amlenv]",
   "language": "python",
   "name": "conda-env-amlenv-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
